NSF PAR Search | NSF Public Access Repository

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

Du, Hongzhe; Li, Weikai; Cai, Min; Saraipour, Karim; Zhang, Zimin; Lakkaraju, Himabindu; Sun, Yizhou; Zhang, Shichang (October 2025, Conference on Language Modeling (COLM))

Full Text Available
Rethink GraphODE Generalization within Coupled Dynamical System

Wan, Guancheng; Huang, Zijie; Zhao, Wanjia; Luo, Xiao; Sun, Yizhou; Wang, Wei (July 2025, ICML 2025)

Full Text Available
SparseCL: Sparse Contrastive Learning for Contradiction Retrieval

Xu, Haike; Lin, Zongyu; Sun, Yizhou; Chang, Kai-Wei; Indyk, Piotr (July 2025, Proceedings of Machine Learning Research)

Full Text Available
Graph Fourier Neural ODEs: Modeling Spatial-temporal Multi-scales in Molecular Dynamics

Sun, Fang; Huang, Zijie; Wang, Haixin; Tang, Huacong; Luo, Xiao; Wang, Wei; Sun, Yizhou (June 2025, Transactions on Machine Learning Research)

Full Text Available
Future Matters for Present: Towards Effective Physical Simulation over Meshes

https://doi.org/10.1145/3690624.3709340

Luo, Xiao; Luo, Junyu; Jiang, Huiyu; Zhou, Hang; Xiao, Zhiping; Ju, Wei; Yang, Carl; Zhang, Ming; Sun, Yizhou (July 2025, ACM)

Full Text Available
Dynamic-Width Speculative Beam Decoding for LLM Inference

https://doi.org/10.1609/aaai.v39i23.34690

Qin, Zongyue; He, Zifan; Prakriya, Neha; Cong, Jason; Sun, Yizhou (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Large language models (LLMs) based on the transformer architecture have shown outstanding performance across numerous real-world tasks. However, the autoregressive nature of these models makes the inference process slow and costly. Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens, which are then validated simultaneously by the larger model, achieving a speed-up of 1-2x. Although speculative decoding matches the distribution of multinomial sampling, multinomial sampling itself is prone to suboptimal outputs, whereas beam sampling is widely recognized for producing higher-quality results by maintaining multiple candidate sequences at each step. This paper explores the novel integration of speculative decoding with beam sampling. However, there are four key challenges: (1) how to generate multiple sequences from the larger model's distribution given draft sequences from the small model; (2) how to dynamically optimize the number of beams to balance efficiency and accuracy; (3) how to efficiently verify the multiple drafts in parallel; and (4) how to address the extra memory costs inherent in beam sampling. To address these challenges, we propose dynamic-width speculative beam decoding (DSBD). Specifically, we first introduce a novel draft and verification scheme that generates multiple sequences following the large model's distribution based on beam sampling trajectories from the small model. Then, we introduce an adaptive mechanism to dynamically tune the number of beams based on the context, optimizing efficiency and effectiveness. In addition, we extend tree-based parallel verification to handle multiple trees simultaneously, accelerating the verification process. Finally, we illustrate a simple modification to our algorithm to mitigate the memory overhead of beam sampling. Experimental results show that our approach achieves a 1.5-1.9x speed-up and 1.8-2.5x lower energy consumption compared to beam sampling, with no loss in downstream performance. Moreover, it can produce significantly higher-quality outputs than speculative decoding, while maintaining similar time, memory, and energy costs. In summary, our method offers a more efficient and effective inference process for LLMs.
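For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of the accept/reject step used in standard single-sequence speculative sampling. It is not the DSBD algorithm from the paper. The function name `verify_draft_token` and the toy distributions are assumptions made for illustration only.

```python
import random

def verify_draft_token(p, q, token, rng):
    """Accept/reject one drafted token (standard speculative sampling).

    p: next-token probabilities from the large (target) model.
    q: next-token probabilities from the small (draft) model.
    token: index that was sampled from q, so q[token] > 0.
    Returns (final_token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[token] / q[token]):
        return token, True
    # On rejection, resample from the residual max(0, p - q), renormalized.
    # A rejection implies p[token] < q[token], so the residual is non-zero.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return rng.choices(range(len(p)), weights=weights)[0], False

# Toy check (hypothetical distributions): the final tokens should follow p,
# even though every draft is sampled from q.
rng = random.Random(0)
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
n = 20000
counts = [0, 0, 0]
for _ in range(n):
    draft = rng.choices(range(3), weights=q)[0]
    final, _ = verify_draft_token(p, q, draft, rng)
    counts[final] += 1
frequencies = [c / n for c in counts]
```

The empirical `frequencies` should land close to `p`, which is the sense in which speculative decoding matches the distribution of multinomial sampling from the large model.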
Full Text Available
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Qin, Zongyue; Hu, Ziniu; He, Zifan; Prakriya, Neha; Cong, Jason; Sun, Yizhou (April 2025, The Thirteenth International Conference on Learning Representations (ICLR 2025))

Full Text Available
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

https://doi.org/10.1609/aaai.v39i17.34033

Li, Weikai; Wang, Ding; Ding, Zijian; Sohrabizadeh, Atefeh; Qin, Zongyue; Cong, Jason; Sun, Yizhou (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

High-level synthesis (HLS) is a widely used tool for designing Field Programmable Gate Arrays (FPGAs). HLS enables FPGA design with software programming languages by compiling the source code into an FPGA circuit. The source code includes a program (called a "kernel") and several pragmas that instruct hardware synthesis, such as parallelization and pipelining. While it is relatively easy for software developers to design the program, designing the pragmas relies heavily on hardware knowledge, posing a big challenge for software developers. Recently, different machine learning algorithms, such as GNNs, have been proposed to automate the pragma design via performance prediction. However, when the trained model is applied to new kernels, the significant domain shift often leads to unsatisfactory performance. We propose a more domain-generalizable model structure: a two-level hierarchical Mixture of Experts (MoE) that can be flexibly adapted to any GNN model. Different expert networks can learn to deal with different regions in the representation space, and they can utilize similar patterns between the old kernels and new kernels. In the low-level MoE, we apply MoE on three natural granularities of a program: node, basic block, and graph. The high-level MoE learns to aggregate the three granularities for the final decision. To stably train the hierarchical MoE, we further propose a two-stage training method. Extensive experiments verify the effectiveness of the hierarchical MoE.
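The two-level gating idea in this abstract can be illustrated with a toy sketch. It assumes scalar expert outputs and fixed gate scores, whereas the paper's model operates on learned GNN representations. All names here (`softmax`, `mixture`, `hierarchical_moe`) and all numbers are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture(gate_scores, expert_outputs):
    """Gate-weighted sum of scalar expert outputs (one MoE layer)."""
    weights = softmax(gate_scores)
    return sum(w * y for w, y in zip(weights, expert_outputs))

def hierarchical_moe(low_level, high_gate_scores):
    """Two-level mixture.

    low_level: maps each granularity name to (gate_scores, expert_outputs).
    high_gate_scores: maps each granularity name to a high-level gate score.
    """
    names = list(low_level)
    # Low level: one mixture of experts per granularity.
    per_granularity = [mixture(*low_level[name]) for name in names]
    # High level: a second gate aggregates the three granularities.
    return mixture([high_gate_scores[name] for name in names], per_granularity)

# Hypothetical inputs: two experts per granularity, uniform gates everywhere.
low_level = {
    "node": ([0.0, 0.0], [1.0, 3.0]),
    "basic_block": ([0.0, 0.0], [2.0, 4.0]),
    "graph": ([0.0, 0.0], [3.0, 5.0]),
}
high = {"node": 0.0, "basic_block": 0.0, "graph": 0.0}
prediction = hierarchical_moe(low_level, high)
```

With uniform gates, each level reduces to a plain average. In a trained model the gate scores would depend on the input, which lets different experts specialize in different regions of the representation space.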
Full Text Available
Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks

Subramonian, Arjun; Kang, Jian; Sun, Yizhou (December 2024, NeurIPS 2024)

Full Text Available
How Do Large Language Models Perform in Dynamical System Modeling

https://doi.org/10.18653/v1/2025.findings-naacl.50

Luo, Xiao; Chen, Binqi; Wang, Haixin; Xiao, Zhiping; Zhang, Ming; Sun, Yizhou (January 2025, Association for Computational Linguistics)

Full Text Available
